Accelerate Training


AutoAssist: A Framework to Accelerate Training of Deep Neural Networks

Neural Information Processing Systems

Deep neural networks have yielded superior performance in many contemporary applications. However, the gradient computation in a deep model with millions of instances leads to a lengthy training process even with modern GPU/TPU hardware acceleration. In this paper, we propose AutoAssist, a simple framework to accelerate training of a deep neural network. Typically, as the training procedure evolves, the amount of improvement by a stochastic gradient update varies dynamically with the choice of instances in the mini-batch. In AutoAssist, we utilize this fact and design an instance shrinking operation that is used to filter out instances with relatively low marginal improvement to the current model; thus the computationally intensive gradient computations are performed on informative instances as much as possible. Specifically, we train a very lightweight Assistant model jointly with the original deep network, which we refer to as Boss.
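The filtering step the abstract describes can be sketched in a few lines. This is an illustrative toy, not the authors' code: the names `assistant_scores` and `shrink_batch` are hypothetical, and the Assistant here is just a logistic model over raw features, with a small floor probability so no instance is discarded forever.

```python
import numpy as np

rng = np.random.default_rng(0)

def assistant_scores(w_a, X):
    """Lightweight Assistant: a logistic model predicting whether an
    instance is still informative (i.e. likely to have high Boss loss)."""
    return 1.0 / (1.0 + np.exp(-X @ w_a))

def shrink_batch(w_a, X, keep_floor=0.1):
    """Instance shrinking: keep instances the Assistant scores as
    informative; keep the rest with a small floor probability so that
    no instance is filtered out permanently."""
    p_keep = np.maximum(assistant_scores(w_a, X), keep_floor)
    mask = rng.random(len(X)) < p_keep
    return X[mask], mask

# Toy mini-batch: 8 instances, 3 features.
X = rng.normal(size=(8, 3))
w_a = np.zeros(3)            # an untrained Assistant scores everything 0.5
Xs, mask = shrink_batch(w_a, X)
# The Boss's expensive gradient step would now run only on Xs.
```

In the full framework, the Assistant would be trained jointly with the Boss on the observed Boss losses, so its scores track which instances currently yield large updates.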


Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Neural Information Processing Systems

We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited. Although our method is much simpler, it still provides much of the speed-up of full batch normalization. In addition, the computational overhead of our method is lower, permitting more optimization steps to be taken in the same amount of time.
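The reparameterization itself is one line: each weight vector w is written as w = g · v/‖v‖, so the scalar g carries the norm and v carries only the direction. A minimal sketch of the forward computation:

```python
import numpy as np

def weight_norm(v, g):
    """Weight normalization: reparameterize a weight vector w as
    w = g * v / ||v||, decoupling its norm (g) from its direction (v)."""
    return g * v / np.linalg.norm(v)

v = np.array([3.0, 4.0])      # ||v|| = 5
w = weight_norm(v, g=2.0)
# ||w|| equals g exactly, regardless of the scale of v.
```

During training, gradients are taken with respect to g and v rather than w, which is what changes the conditioning of the problem.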


Reviews: AutoAssist: A Framework to Accelerate Training of Deep Neural Networks

Neural Information Processing Systems

The theoretical study of instance shrinkage in Pegasos is, as far as I know, novel and interesting. Especially interesting is that instance shrinkage does not affect the solution the model converges to, which justifies the later experiments that omit importance-sampling corrections in deep nets. Similarly, the idea of training a small assistant model just to predict the loss of the base model on unseen examples is straightforward and potentially useful. The algorithm is clearly described, including all hyperparameters, and it does look like it should be possible to replicate the experiments. From the experimental section, however, it remains unclear whether this algorithm is actually an improvement over regular training with no curriculum attached.
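The Pegasos connection the review praises can be made concrete: under the hinge loss, an instance with margin y·⟨w, x⟩ ≥ 1 contributes nothing to the subgradient, so skipping its gradient term is exact rather than an approximation. A minimal single-instance Pegasos step illustrating this (standard Pegasos, not the paper's implementation):

```python
import numpy as np

def pegasos_step(w, x, y, lam, t):
    """One Pegasos step for a linear SVM with hinge loss.
    Instances with margin >= 1 have zero hinge subgradient, so a
    shrinking filter can drop their gradient computation exactly."""
    eta = 1.0 / (lam * t)                  # standard Pegasos step size
    if y * (w @ x) < 1.0:                  # instance is informative
        return (1.0 - eta * lam) * w + eta * y * x
    return (1.0 - eta * lam) * w           # only the regularizer acts

w = np.zeros(2)
w = pegasos_step(w, np.array([1.0, 0.0]), y=1.0, lam=0.1, t=1)
```

For deep networks no such exact zero-gradient condition exists, which is why the Assistant has to *predict* which instances are uninformative instead.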


Reviews: AutoAssist: A Framework to Accelerate Training of Deep Neural Networks

Neural Information Processing Systems

This paper addresses an important problem, and the empirical results look promising. The method is simple and clearly presented. To make this work more convincing, as pointed out by the reviewers, it would be nice to add a tuned SGD/momentum baseline and to include a thorough discussion of related work.


Reviews: Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Neural Information Processing Systems

The suggested reparametrisation and its theoretical analysis are very interesting, and I enjoyed reading the paper. However, some points in the theoretical analysis could be improved. The paper argues that the new parametrisation improves the conditioning of the gradient, but neither a strong theoretical argument nor an empirical demonstration for this is given. In line 127 it is said that "Empirically, we find that w is often (close to) a dominant eigenvector of the covariance matrix C", but the corresponding experiments are shown neither in the paper nor in the supplemental material. In lines 122/123 the authors claim: "It has been observed that neural networks with batch normalization also have this property (to be relatively insensitive to different learning rates), which can be explained by this analysis." However, it did not become clear to me how the analysis of the previous sections can be transferred directly to batch normalisation.
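The geometric fact the paper's analysis does establish can be checked numerically: under w = g · v/‖v‖, the gradient with respect to v is the projection of ∇w L onto the complement of w, and is therefore orthogonal to v (scaling v leaves w unchanged). A finite-difference check of this property (an illustrative sketch, not one of the paper's experiments):

```python
import numpy as np

def loss(w):
    return np.sum(w ** 3)      # any smooth test loss works here

def fd_grad_v(v, g, eps=1e-6):
    """Central finite-difference gradient of loss(g * v / ||v||) w.r.t. v."""
    grad = np.zeros_like(v)
    for i in range(len(v)):
        vp, vm = v.copy(), v.copy()
        vp[i] += eps
        vm[i] -= eps
        grad[i] = (loss(g * vp / np.linalg.norm(vp))
                   - loss(g * vm / np.linalg.norm(vm))) / (2 * eps)
    return grad

v = np.array([1.0, 2.0, -0.5])
grad_v = fd_grad_v(v, g=3.0)
# grad_v is (numerically) orthogonal to v.
```

This orthogonality is what underlies the learning-rate robustness argument; the open question raised above is whether the same reasoning carries over to batch normalisation.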


Reduce deep learning training time and cost with MosaicML Composer on AWS

#artificialintelligence

In the past decade, we have seen deep learning (DL) adopted at a tremendous pace by AWS customers. The plentiful, jointly trained parameters of DL models have a large representational capacity that has brought improvements in numerous customer use cases, including image and speech analysis, natural language processing (NLP), time series processing, and more. In this post, we highlight challenges commonly reported specifically in DL training, and how the open-source library MosaicML Composer helps solve them. DL models are trained iteratively, in a nested for-loop: an inner loop iterates through the training dataset chunk by chunk and, if necessary, an outer loop repeats this process several times over the whole dataset.
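The nested loop the post describes is the standard epoch/mini-batch structure. A generic skeleton (not Composer's API; `step_fn` stands in for the forward pass, backward pass, and parameter update):

```python
def train(dataset, num_epochs, batch_size, step_fn):
    """Generic DL training skeleton: an outer loop over epochs repeats
    an inner loop over mini-batches ("chunks") of the dataset."""
    steps = 0
    for epoch in range(num_epochs):                       # outer loop
        for start in range(0, len(dataset), batch_size):  # inner loop
            batch = dataset[start:start + batch_size]
            step_fn(batch)    # forward pass, backward pass, update
            steps += 1
    return steps

# 100 instances in batches of 32 -> 4 batches per epoch, times 2 epochs.
n_steps = train(list(range(100)), num_epochs=2, batch_size=32,
                step_fn=lambda batch: None)
```

Speedup libraries like Composer work by modifying what happens inside this loop (data order, per-step computation) without changing its overall shape.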


59th MDW: Alamo Spark Cell drives innovation throughout the Air Force

#artificialintelligence

Throughout the Air Force, teams referred to as Spark Cells serve as a hub for innovation. The 59th Training Group's Alamo Spark Cell is a collaborative team that focuses on improving training at the Medical Education and Training Campus. "Our Spark Cell team works with the whole campus here and also works with the Air Force Medical Modeling and Simulation Training at Randolph," said Tech. "We have every person we can get involved within the campus, and we brainstorm ideas. We ask ourselves, how can we innovate and accelerate training?" Even during the pandemic, these innovators have implemented new ideas to help improve their students' education.


AutoAssist: A Framework to Accelerate Training of Deep Neural Networks

Zhang, Jiong, Yu, Hsiang-Fu, Dhillon, Inderjit S.

Neural Information Processing Systems
